Information Subtraction: Learning Representations for Conditional Entropy

Leong, Keng Hou, Xiu, Yuxuan, Chan, Wai Kin

arXiv.org Artificial Intelligence

We may consider the observations as samples from stochastic distributions and use information-theoretic measures, as shown in Figure 1, to quantify the uncertainty and shared information among variables. These measures reveal the strength of relationships between variables, including correlation and Granger causality (Pearl 2009). Beyond merely recognizing the magnitude of such relationships, many representation learning works aim to further explain and describe them, enhancing our understanding of and control over the system (Yao et al. 2021; Xu et al. 2023). These approaches generate representations that maximize information about the targets, as they must be capable of accurately reconstructing the targets (Kingma and Welling 2013; Clark et al. 2019). Therefore, most methods can effectively represent entropy H(Y) or mutual information I(X;Y), which describe the total information of Y and the information shared between X and Y, respectively, as shown in Figure 1. However, fewer methods have addressed the representation of other information terms such as conditional entropy H(Y|X) and conditional mutual information I(X;Y|W), which describe the information in Y not provided by X, and the information that X provides to Y but W does not, respectively. Representing conditional mutual information is significant because it reveals the distinct impact of a specific factor on the target, beyond what other factors provide. For example, identifying the distinct effect of funding on a scholar's publications, separate from other factors, can guide policy decisions such as terminating funding that shows no significant benefit. Furthermore, representing conditional entropy helps in creating fair and unbiased representations by removing the influence of sensitive factors.
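To make the four terms above concrete, here is a minimal plug-in sketch on discrete samples. It is our own illustration, not code from the paper: the function names and the XOR toy example are assumptions, and the naive empirical estimates stand in for the learned representations the paper actually develops.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in estimate of H(.) in bits from a sequence of discrete samples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(y, x):
    """H(Y|X) = H(X,Y) - H(X): the information in Y not provided by X."""
    return entropy(list(zip(x, y))) - entropy(x)

def mutual_information(x, y):
    """I(X;Y) = H(Y) - H(Y|X): the information shared by X and Y."""
    return entropy(y) - conditional_entropy(y, x)

def conditional_mutual_information(x, y, w):
    """I(X;Y|W) = H(Y|W) - H(Y|X,W): what X adds about Y beyond W."""
    return conditional_entropy(y, w) - conditional_entropy(y, list(zip(x, w)))

# Toy check: with Y = X XOR W, X alone says nothing about Y,
# but given W it determines Y completely.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10_000)
w = rng.integers(0, 2, 10_000)
y = x ^ w
print(mutual_information(x, y))                 # ~0 bits
print(conditional_mutual_information(x, y, w))  # ~1 bit
```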


Disentanglement Analysis with Partial Information Decomposition

Tokui, Seiya, Sato, Issei

arXiv.org Machine Learning

When we recognize objects, sounds, sentences, or anything else perceptible, we quickly comprehend how each differs from others in properties that may individually vary across instances, such as color, shape, texture, pitch, rhythm, writing style, and tone. Such interpretable factors of variation are useful for understanding what constitutes the variation in data and for manipulating data generation when a generative process is available. Disentanglement is a guiding principle for designing a learned representation separable into parts that individually capture the underlying factors of variation. The concept originated as an inductive bias in representation learning towards obtaining representations aligned with the underlying factors of variation in data (Bengio et al., 2013) and has been applied to controlling otherwise unstructured representations of data in several domains, e.g., images (Karras et al., 2019; Esser et al., 2019), text (Hu et al., 2017), and audio (Hsu et al., 2019), to name just a few. While the concept is appealing, a concrete definition of disentanglement is not trivial. Most existing studies after Higgins et al. (2017) proposed generative learning methods that encourage latent variables to be marginally independent of each other; however, it is still not clear whether that is the ultimate direction for better disentanglement (Higgins et al., 2018). To understand disentanglement, it is crucial to design disentanglement metrics that measure how well representations disentangle the true generative factors, since defining such metrics is likewise non-trivial (Higgins et al., 2017; Kim & Mnih, 2018; Chen et al., 2018; Eastwood & Williams, 2018).
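Among the metrics cited above, the Mutual Information Gap (MIG) of Chen et al. (2018) is compact enough to sketch. The code below is a minimal reading of that metric, not the authors' implementation; the binning scheme, function name, and signature are our own choices.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig(factors, codes, n_bins=20):
    """Mutual Information Gap: for each ground-truth factor, the gap
    between the two most informative latent dimensions, normalized by
    the factor's entropy. Scores near 1 mean each factor is captured
    by a single latent dimension. Assumes >= 2 latent dimensions.

    factors: (n_samples, n_factors) array of discrete factors
    codes:   (n_samples, n_latents) array of continuous latent codes
    """
    # Discretize each latent dimension so plug-in MI estimates apply.
    binned = np.stack(
        [np.digitize(c, np.histogram_bin_edges(c, bins=n_bins)[1:-1])
         for c in codes.T],
        axis=1)
    gaps = []
    for k in range(factors.shape[1]):
        f = factors[:, k]
        mis = sorted((mutual_info_score(f, binned[:, j])
                      for j in range(binned.shape[1])), reverse=True)
        h_f = mutual_info_score(f, f)  # entropy of the factor (nats)
        gaps.append((mis[0] - mis[1]) / h_f)
    return float(np.mean(gaps))
```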


A Note on Semantic Web Services Specification and Composition in Constructive Description Logics

Bozzato, Loris, Ferrari, Mauro

arXiv.org Artificial Intelligence

The idea of the Semantic Web is to annotate Web content and services with computer-interpretable descriptions, with the aim of automating many tasks currently performed by human users. In the context of Web services, one of the most interesting tasks is their composition. In this paper we formalize this problem in the framework of a constructive description logic. In particular, we propose a declarative service specification language and a calculus for service composition. We show by means of an example how this calculus can be used to define composed Web services, and we discuss the problem of automatic service synthesis.


Learning Graphical Models with Mercer Kernels

Bach, Francis R., Jordan, Michael I.

Neural Information Processing Systems

We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
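For intuition, a two-variable version of a KGV-style mutual information can be written directly, though the sketch below is only our dense reading of the idea: the RBF kernel, the regularization constant, and the eigenvalue formulation are assumptions, and the O(n^3) linear algebra here ignores the low-rank kernel trick that gives the paper its linear-time statistics.

```python
import numpy as np

def centered_gram(x, sigma=1.0):
    """Centered RBF Gram matrix of a 1-D sample (an assumed kernel choice)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kgv_mutual_information(x, y, sigma=1.0, kappa=1e-2):
    """KGV-style mutual information surrogate between two samples:
    I = -1/2 * sum_i log(1 - rho_i^2), with rho_i the regularized
    kernel canonical correlations. A dense sketch, not the paper's
    linear-time algorithm."""
    n = len(x)
    Kx, Ky = centered_gram(x, sigma), centered_gram(y, sigma)
    # r(K) = K (K + n*kappa/2 I)^{-1}: regularized correlation operator.
    Rx = Kx @ np.linalg.inv(Kx + n * kappa / 2.0 * np.eye(n))
    Ry = Ky @ np.linalg.inv(Ky + n * kappa / 2.0 * np.eye(n))
    # Squared canonical correlations as eigenvalues of Rx^2 Ry^2.
    rho2 = np.linalg.eigvals(Rx @ Rx @ Ry @ Ry).real
    rho2 = np.clip(rho2, 0.0, 1.0 - 1e-12)
    return -0.5 * float(np.sum(np.log1p(-rho2)))
```

In the paper, such KGV terms act as Gaussian-like surrogates for the mutual information quantities used to score candidate graph structures over both discrete and continuous variables.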

